Hash Based Parallel Algorithms for Mining Association Rules
Abstract
In this paper, we propose four parallel algorithms (NPA, SPA, HPA and HPA-ELD) for mining association rules on shared-nothing parallel machines to improve mining performance. In NPA, candidate itemsets are simply copied to all the processors, which can lead to memory overflow for large transaction databases. The remaining three algorithms partition the candidate itemsets over the processors. If they are partitioned simply (SPA), transaction data has to be broadcast to all processors. HPA partitions the candidate itemsets using a hash function to eliminate broadcasting, which also reduces the comparison workload significantly. HPA-ELD fully utilizes the available memory space by detecting the extremely large itemsets and copying them, which is also very effective at flattening the load over the processors. We implemented these algorithms in a shared-nothing environment. Performance evaluations show that the best algorithm, HPA-ELD, attains good linearity on speedup ratio and is effective for handling skew.
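The hash-partitioning idea behind HPA can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`owner`, `partition_candidates`, `hpa_count_pass`), the CRC32-based hash, and the simulated "processors" (plain Python lists) are all assumptions made for illustration. Each candidate k-itemset is assigned to exactly one processor by a hash function, and each k-itemset generated from a transaction is routed only to that owner rather than broadcast to everyone.

```python
from collections import Counter
from itertools import combinations
import zlib


def owner(itemset, num_procs):
    """Hypothetical hash function: map a sorted itemset to a processor id."""
    key = ",".join(map(str, itemset)).encode()
    return zlib.crc32(key) % num_procs


def partition_candidates(candidates, num_procs):
    """Assign every candidate itemset to exactly one processor by hashing."""
    parts = [set() for _ in range(num_procs)]
    for cand in candidates:
        cand = tuple(sorted(cand))
        parts[owner(cand, num_procs)].add(cand)
    return parts


def hpa_count_pass(local_transactions, candidates, k, num_procs):
    """Simulate one support-counting pass.

    local_transactions[p] holds the transactions stored at processor p.
    Returns the merged support counts and the number of itemsets that
    would have been sent to a different processor.
    """
    parts = partition_candidates(candidates, num_procs)
    counts = [Counter() for _ in range(num_procs)]
    remote_sends = 0
    for p, transactions in enumerate(local_transactions):
        for t in transactions:
            # Generate the k-itemsets of the transaction and route each one
            # to the single processor that owns its hash bucket.
            for itemset in combinations(sorted(set(t)), k):
                dest = owner(itemset, num_procs)
                if dest != p:
                    remote_sends += 1
                # The owner only has to probe its own candidate partition.
                if itemset in parts[dest]:
                    counts[dest][itemset] += 1
    merged = Counter()
    for c in counts:
        merged.update(c)
    return merged, remote_sends


if __name__ == "__main__":
    data = [[[1, 2, 3], [1, 2]], [[2, 3], [1, 3]]]  # two simulated processors
    cands = [(1, 2), (1, 3), (2, 3)]
    support, sends = hpa_count_pass(data, cands, k=2, num_procs=2)
    print(dict(support), sends)
```

Because each candidate lives on one processor only, the per-processor memory footprint shrinks as processors are added, which is the contrast with NPA's full replication that the abstract describes.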
Similar resources
An Incremental Mining Algorithm for Association Rules Based on Minimal Perfect Hashing and Pruning
In the literature, hash-based association rule mining algorithms are more efficient than Apriori-based algorithms, since they employ hash functions to generate candidate itemsets efficiently. However, when the dataset is updated, the whole hash table needs to be reconstructed. In this paper, we propose an incremental mining algorithm based on minimal perfect hashing. In our algorithm, each can...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
Discovering Association Rules Change from Large Databases
Discovering association rules and association rules change (ARC) from existing large databases is an important problem. This paper presents an approach based on multi-hash chain structures to mine association rules change from large database with shorter transactions. In most existing algorithms of association rules change, the mining procedure is divided into two phases, first, association rul...
IARMMD: A Novel System for Incremental Association Rules Mining from Medical Documents
This paper presents a novel system for Incremental Association Rules Mining from Medical Documents (IARMMD). The system is concerned with maintaining the discovered association rules and avoids redoing the mining process on all documents during the updating process. The design of the system is based on concept representation. It is designed to extend our previous D-EART system. The IARMMD impro...
An Improved Technique Of Extracting Frequent Itemsets From Massive Data Using MapReduce
The mining of frequent itemsets is a basic and essential task in many data mining applications. Extracting frequent itemsets, together with frequent patterns and rules, supports applications such as association rule mining and correlation analysis, including in product sales and marketing. A number of algorithms are used in the frequent itemset extraction process, such as FP-growth and Eclat. But unfortunately these algorit...
Publication date: 1996